Introduction

The Worldwide Bureaucracy Indicators (WWBI) database is a unique cross-national dataset on public sector employment and wages that aims to fill an information gap, thereby helping researchers, development practitioners, and policymakers gain a better understanding of the personnel dimensions of state capability, the footprint of the public sector within the overall labor market, and the fiscal implications of the public sector wage bill. The dataset is derived from administrative data and household surveys, thereby complementing existing, expert perception-based approaches.

It is organized into three datasets:

  1. wwbi_data : Contains the measurements of the bureaucratic indicators.

  2. wwbi_series : Contains additional information about specific indicators, keyed by indicator_code, which uniquely identifies each metric.

  3. wwbi_country : Contains country-level information, identified by country_code, which helps in enriching the dataset with country-specific attributes.

In this report, we explore the primary question:

How do public sector wage structures differ across regions and gender groups?

To break the main question down, we visualize the following sub-questions:

  1. How does the wage structure vary over time across regions?
    Visualization used : heat map

  2. Are there noticeable trends or divergences in male versus female wage premiums within each region?
    Visualization used : time series line chart

  3. How does the system of trade influence the wage bill as a percentage of public expenditure across regions?
    Visualization used : bar chart

The wwbi_full dataset is a refined and comprehensive version of the Worldwide Bureaucracy Indicators (WWBI) data, created by merging and cleaning data from multiple tables within the WWBI database.

Note that visualizations 2 and 3 are interactive: hovering over a point displays information about that data point, and the graphs can be zoomed and panned.

Data Joins and Sources
The cleaned wwbi_full dataset is built by combining the three datasets introduced above (wwbi_data, wwbi_series, wwbi_country). It contains detailed information on economic indicators by country. Key variables used in this analysis include:

  1. year: The year of the recorded data, formatted as a date (first day of the year).

  2. country_code: The country code in uppercase format, uniquely identifying each country.

  3. region: Geographic region to which the country belongs.

  4. income_group: The income classification of the country, categorized as “Low income,” “Lower middle income,” “Upper middle income,” and “High income.” This is stored as an ordered factor to facilitate comparisons across income levels.

  5. system_of_trade: The system used for trading, formatted as a factor variable.

  6. value: The recorded value of the economic indicator for the specified year and country.

  7. short_name and long_name: The short and full names of each country.

  8. x2_alpha_code: An alternate country code.

  9. indicator_code and indicator_name: Codes and names identifying the specific economic indicator measured.

Importing Datasets and Libraries

library(tidyverse)
library(readxl)
library(lubridate)
library(stringr)
library(dplyr)
library(ggthemes)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(plotly)
wwbi_data <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_data.csv')
wwbi_series <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_series.csv')
wwbi_country <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_country.csv')

Data Cleaning and Summary

To prepare the wwbi_full dataset for analysis, the following cleaning steps were applied:

  1. Joining and Selection:
    The primary data tables (wwbi_data, wwbi_series, and wwbi_country) were merged on their common columns: indicator_code for wwbi_series and country_code for wwbi_country. Only the columns needed for analysis were retained.

  2. Data Transformation:

    • Date Formatting: The year column was converted to a Date format (January 1st of each year) to facilitate time-based analysis.
    • Factor Levels: The income_group was set as an ordered factor to allow comparisons in ascending order of income levels.
    • Standardization: The country_code column was converted to uppercase for consistency across country codes.
  3. Handling Duplicates and Missing Values:

    • Duplicates: Duplicate entries were removed to ensure unique records.
    • Missing Values: Rows containing missing values were removed. This step is justified because NA values might skew the results or cause inconsistencies in the analysis.
  4. Validation:
    After cleaning, the dataset structure was inspected to verify that all variables were formatted as expected and ready for analysis.
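As a minimal sketch of the duplicate and missing-value steps, the toy table below uses hypothetical values (not WWBI data) and the base R functions unique() and na.omit(); unique() stands in here for dplyr's distinct(), which the report's pipeline uses. It illustrates that na.omit() drops a row when any column is NA, so a missing country attribute can remove an otherwise valid measurement.

```r
# Toy table with hypothetical values (not taken from WWBI)
toy <- data.frame(
  country_code = c("AAA", "AAA", "BBB", "CCC"),
  value        = c(1.2, 1.2, NA, 0.8),
  region       = c("North", "North", "South", NA),
  stringsAsFactors = FALSE
)

# Step 1: drop exact duplicate rows (base R analogue of dplyr::distinct())
toy_unique <- unique(toy)

# Step 2: drop rows that have an NA in ANY column
toy_clean <- na.omit(toy_unique)

# Row accounting: how many rows each step removed
rows_removed <- c(
  duplicates = nrow(toy) - nrow(toy_unique),
  missing    = nrow(toy_unique) - nrow(toy_clean)
)
print(rows_removed)
```

In this toy table the row for CCC is lost only because its region is missing, even though its value is present.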

wwbi_full <- wwbi_data %>% 
  inner_join(wwbi_series, by = "indicator_code") %>% 
  inner_join(wwbi_country, by = "country_code") %>%
  select(year,country_code,region, income_group, system_of_trade, value, short_name, long_name, x2_alpha_code, indicator_code, indicator_name) %>% 
  mutate(
    year = as.Date(paste0(year, "-01-01")),
    income_group = factor(
      income_group,
      levels = c("Low income", "Lower middle income", "Upper middle income", "High income"),
      ordered = TRUE),
    system_of_trade = as.factor(system_of_trade),
    country_code = str_to_upper(country_code)) %>% #converting variables into appropriate types
  distinct() %>% #keep only unique data
  na.omit() #remove missing data

head(wwbi_full)
## # A tibble: 6 × 11
##   year       country_code region   income_group system_of_trade value short_name
##   <date>     <chr>        <chr>    <ord>        <fct>           <dbl> <chr>     
## 1 2007-01-01 AFG          South A… Low income   General trade … 1.11  Afghanist…
## 2 2013-01-01 AFG          South A… Low income   General trade … 0.649 Afghanist…
## 3 2007-01-01 AFG          South A… Low income   General trade … 0.533 Afghanist…
## 4 2013-01-01 AFG          South A… Low income   General trade … 0.433 Afghanist…
## 5 2007-01-01 AFG          South A… Low income   General trade … 0.858 Afghanist…
## 6 2013-01-01 AFG          South A… Low income   General trade … 0.661 Afghanist…
## # ℹ 4 more variables: long_name <chr>, x2_alpha_code <chr>,
## #   indicator_code <chr>, indicator_name <chr>
str(wwbi_full)
## tibble [136,290 × 11] (S3: tbl_df/tbl/data.frame)
##  $ year           : Date[1:136290], format: "2007-01-01" "2013-01-01" ...
##  $ country_code   : chr [1:136290] "AFG" "AFG" "AFG" "AFG" ...
##  $ region         : chr [1:136290] "South Asia" "South Asia" "South Asia" "South Asia" ...
##  $ income_group   : Ord.factor w/ 4 levels "Low income"<"Lower middle income"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ system_of_trade: Factor w/ 2 levels "General trade system",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ value          : num [1:136290] 1.11 0.649 0.533 0.433 0.858 ...
##  $ short_name     : chr [1:136290] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ long_name      : chr [1:136290] "Islamic State of Afghanistan" "Islamic State of Afghanistan" "Islamic State of Afghanistan" "Islamic State of Afghanistan" ...
##  $ x2_alpha_code  : chr [1:136290] "AF" "AF" "AF" "AF" ...
##  $ indicator_code : chr [1:136290] "BI.WAG.PRVS.FM.SM" "BI.WAG.PRVS.FM.SM" "BI.WAG.PRVS.FM.MD" "BI.WAG.PRVS.FM.MD" ...
##  $ indicator_name : chr [1:136290] "Female to male wage ratio in the private sector (using mean)" "Female to male wage ratio in the private sector (using mean)" "Female to male wage ratio in the private sector (using median)" "Female to male wage ratio in the private sector (using median)" ...
##  - attr(*, "na.action")= 'omit' Named int [1:5695] 1487 1488 1489 1490 1491 1492 1493 1494 13955 13956 ...
##   ..- attr(*, "names")= chr [1:5695] "1487" "1488" "1489" "1490" ...
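The visual inspection above could be complemented with a few programmatic checks. The helper below, check_wwbi_structure(), is a hypothetical name introduced for illustration, and the checks it performs are one reading of what "formatted as expected" means in the cleaning steps. It is shown on a small stand-in data frame with made-up values so that it runs without downloading the data.

```r
# Hypothetical helper: stops with an error if a cleaned table violates
# the formatting described in the cleaning steps
check_wwbi_structure <- function(df) {
  stopifnot(
    inherits(df$year, "Date"),                          # year stored as Date
    is.ordered(df$income_group),                        # ordered factor
    is.factor(df$system_of_trade),                      # factor
    all(df$country_code == toupper(df$country_code)),   # upper-case codes
    !any(is.na(df)),                                    # no missing values
    !any(duplicated(df))                                # no duplicate rows
  )
  invisible(TRUE)
}

# Stand-in data with hypothetical values (not taken from WWBI)
toy_full <- data.frame(
  year = as.Date(c("2007-01-01", "2013-01-01")),
  country_code = c("AAA", "AAA"),
  income_group = factor(c("Low income", "Low income"),
                        levels = c("Low income", "Lower middle income",
                                   "Upper middle income", "High income"),
                        ordered = TRUE),
  system_of_trade = factor(c("General trade system", "General trade system")),
  value = c(1.11, 0.649)
)

check_result <- check_wwbi_structure(toy_full)
```

On the real data the call would be check_wwbi_structure(wwbi_full); it returns TRUE invisibly when every check passes and raises an error otherwise.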

Visualization 1: How does the wage structure vary over time across regions?

About the Visual

The following heatmap shows the average wage structure of each country over 5-year intervals. Each panel represents one interval, and together the panels span 1995 to 2020, which makes it possible to compare wage dynamics across both time and geography. The analysis is based on the BI.WAG.TOTL.GD.ZS indicator, which measures the wage bill as a percentage of GDP and so gives a relative sense of how much of a country’s economic output goes to wages.

In order to prepare the data for the visualization:

  1. wage_data_summary holds the data aggregated by country and 5-year interval, with the average value stored as avg_wage.

  2. world holds the geographic data containing country boundaries and metadata.

  3. world_wage_map, the final joined dataset, combines the spatial boundaries with the summarized wage data for visualization.

A heat map of the average wage structure is well suited to this question because it shows clearly how values change over time across countries, and how large those changes are (as indicated by the intensity on the color scale).
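Before joining the map data to the wage summary, it can help to check which country codes fail to match between the two sources, since unmatched countries simply do not appear on the map. The sketch below uses hypothetical code vectors as stand-ins for world$iso_a3 and wage_data_summary$country_code.

```r
# Hypothetical stand-ins for world$iso_a3 and wage_data_summary$country_code
map_codes  <- c("AAA", "BBB", "CCC", NA)
wage_codes <- c("AAA", "CCC", "DDD")

# Codes with wage data but no map polygon: a left join from the map
# never picks these up, so they cannot be drawn
wage_without_map <- setdiff(wage_codes, map_codes)

# Map polygons without wage data: these get NA for avg_wage after the
# join and are removed by the filter on avg_wage
map_without_wage <- setdiff(map_codes[!is.na(map_codes)], wage_codes)

print(wage_without_map)
print(map_without_wage)
```

A non-empty wage_without_map would suggest that the two sources encode some countries differently and that the join key deserves a closer look.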

# Load world map data
world <- ne_countries(scale = "medium", returnclass = "sf")

# Summarize the wage data by country and 5-year intervals
wage_data_summary <- wwbi_full %>%
  filter(indicator_code == "BI.WAG.TOTL.GD.ZS") %>%
  mutate(
    year_interval = cut(
      lubridate::year(year),  # Extract numeric year
      breaks = seq(1960, 2025, by = 5),  # 5-year intervals
      labels = paste(seq(1960, 2020, by = 5), seq(1965, 2025, by = 5), sep = "-"),  # Interval labels
      include.lowest = TRUE)) %>%  # Include the first year in the first interval
  group_by(country_code, year_interval) %>%
  summarize(avg_wage = mean(value, na.rm = TRUE), .groups = "drop") %>%
  ungroup()

# Join the summarized wage data with the world map data
world_wage_map <- world %>%
  left_join(wage_data_summary, by = c("iso_a3" = "country_code")) %>%
  filter(!is.na(avg_wage))  # Remove rows with NA avg_wage

# Plot the heatmaps for different 5-year intervals
ggplot(world_wage_map) +
  geom_sf(aes(fill = avg_wage, text = paste0("<b>Country :</b> ", admin, "<br>",
              "<b>Year Interval:</b> ", year_interval, "<br>", 
              "<b>Average wage :</b> ", avg_wage, "<br>")), color = "black") +  
  facet_wrap(~ year_interval) +  # Facet by 5-year intervals
  scale_fill_viridis_c(name = "Average Wage", direction = -1) +  # Color scale
  labs(title = "Average Wage Structure Across Countries by 5-Year Intervals",
    caption = "Source: WWBI Data") +
  theme_minimal() +
  theme(strip.text = element_text(size = 10),  # Adjust facet label size
    plot.title = element_text(hjust = 0.5),  # Center the title
    axis.text.x = element_text(angle = 90),
    legend.position = "top")

# saving the plot as a jpeg file for easy reference later
ggsave("../figures/project_visual1.jpeg")
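The binning in the code above relies on the default behaviour of cut(), which builds intervals that are closed on the right. A short check with a few example years (chosen here only for illustration) shows the consequence: a boundary year such as 2000 is assigned to the interval labelled 1995-2000 rather than 2000-2005, which would explain how data recorded for 2000 can appear in a panel that starts at 1995.

```r
# Same breaks and labels as in the summary step
breaks <- seq(1960, 2025, by = 5)
labels <- paste(seq(1960, 2020, by = 5), seq(1965, 2025, by = 5), sep = "-")

# Example years chosen to sit on and next to interval boundaries
example_years <- c(1960, 2000, 2001, 2005, 2020)

# Default right = TRUE: each interval includes its upper bound;
# include.lowest = TRUE additionally lets 1960 fall in the first interval
binned <- cut(example_years, breaks = breaks, labels = labels,
              include.lowest = TRUE)

print(data.frame(year = example_years, interval = as.character(binned)))
```

Passing right = FALSE to cut() would instead place each boundary year in the interval that starts with it, so 2000 would fall under 2000-2005.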

Primary Insights, Observations and Explanation of Results

  1. Regional Wage Disparity
    The visualization brings to light stark regional disparities in wage levels across the globe, with significant variation between high-income and low-income regions. Developed regions, such as North America and Western Europe, consistently display higher wage levels across the years, as represented by darker shades on the heatmap. Several factors contribute to these sustained higher wages: these regions have well-established labour markets, higher productivity, and advanced economic structures, giving them a competitive edge over less developed countries.
    In contrast, regions in Sub-Saharan Africa, parts of South Asia, and portions of Latin America are characterized by lighter shades, indicating significantly lower average wages. These regions often face structural challenges such as weaker industrial bases, limited access to global markets, lower productivity, and higher rates of informal employment. The persistent differences in wage levels between these regions and their developed counterparts reflect entrenched economic inequalities. These concerns have remained largely unaddressed, or shown negligible change over the years.
    The wage disparities also underline the relationship between regional development levels and labor remuneration. While high-income regions benefit from technological advancements, robust infrastructure, and effective governance, low-income regions are often constrained by limited resources, political instability, and lack of investment in human capital. This consistent wage gap raises important questions about global inequality and the mechanisms required to foster more inclusive economic growth.

  2. Temporal Trends
    Over the analyzed 5-year intervals, some regions exhibit gradual wage growth, as indicated by the increasing intensity of colors in parts of the map. For example, East Asia and certain areas in South America show consistent improvements in wage levels, signaling their progression as emerging economies. These trends are likely attributable to factors such as industrial expansion, integration into global trade networks, and targeted domestic policies aimed at improving labor conditions.
    Conversely, several low-income regions demonstrate stagnation in wage levels, with minimal changes in color intensity across intervals. Sub-Saharan Africa and parts of South Asia remain particularly vulnerable, with limited progress in economic development and labor market improvements. These regions’ stagnation highlights the challenges of breaking out of cycles of low productivity, poor infrastructure, and reliance on informal employment sectors.
    The visualization suggests that while globalization and economic reforms have spurred growth in some regions, others have not experienced the same benefits. This dichotomy underscores the uneven nature of global economic progress and the need for more targeted development strategies.

Visualization 3: How does the system of trade influence the wage bill as a percentage of public expenditure across regions?

About the Visual

The visualization used is a diverging bar chart that depicts the relation between the wage bill as a percentage of public expenditure and the regions.

In order to prepare the data for the visual:

  1. The wage_system_of_trade tibble was created by filtering the data to the indicator_code “BI.WAG.TOTL.PB.ZS”, which is the “Wage bill as a percentage of Public Expenditure”.

  2. The Total variable was calculated by grouping by region and system_of_trade and summing all the values in each group.

The plot selected suits the question because it clearly shows the differences by system_of_trade. Additionally, placing the regions on the y-axis and filling the bars by system_of_trade makes it easy to see how the Total splits within each region.

wage_system_of_trade <- wwbi_full %>%
  filter(indicator_code == "BI.WAG.TOTL.PB.ZS") %>%
  group_by(region, system_of_trade) %>%
  summarise(Total = sum(value, na.rm = TRUE)) %>%
  ungroup()

visualization3 <- ggplot(
  wage_system_of_trade,
  aes(y = region, x = Total, fill = system_of_trade,
      text = paste0("<b>System of Trade :</b> ", system_of_trade, "<br>",
                    "<b>Rounded wage bill as % of public expenditure:</b> ", round(Total), "<br>"))) +
  geom_col(data = wage_system_of_trade %>% filter(system_of_trade == "General trade system")) +
  # special trade system bars are mirrored to the left by negating Total
  geom_col(data = wage_system_of_trade %>% filter(system_of_trade == "Special trade system"), aes(x = -Total)) +
  labs(y = "", x = "Wage bill as a % of Public Expenditure", fill = "System of Trade",
       caption = "Source: Worldwide Bureaucracy Indicators (WWBI) dataset from the World Bank",
       title = "System of trade vs the wage bill (%) of public expense across regions") +
  scale_x_continuous(labels = abs) +  # label mirrored values as positive
  scale_fill_manual(values = c("General trade system" = "maroon", "Special trade system" = "darkorange")) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 10))

ggplotly(visualization3, tooltip = "text")
# saving the plot as a jpeg file for easy reference later
ggsave("../figures/project_visual3.jpeg", plot = visualization3)
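Because Total is a sum over every observation in a group, its size depends on how many observations the group contains as well as on the percentages themselves. The toy example below uses hypothetical values (not WWBI data) and base R's aggregate() in place of the dplyr pipeline; it contrasts the sum with a per-group mean, which is shown as one possible alternative and is not the method used in this report.

```r
# Toy data with hypothetical percentages (not taken from WWBI):
# region A has three observations, region B has one
toy <- data.frame(
  region = c("A", "A", "A", "B"),
  system_of_trade = rep("General trade system", 4),
  value = c(20, 22, 24, 30)
)

# Sum per group, mirroring how Total is computed
toy_sum <- aggregate(value ~ region + system_of_trade, data = toy, FUN = sum)

# Mean per group, one possible alternative
toy_mean <- aggregate(value ~ region + system_of_trade, data = toy, FUN = mean)

print(toy_sum)
print(toy_mean)
```

In this toy case region A has the larger sum even though each of its values is lower than region B's single value, so the ranking of regions can differ between the two summaries.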

Primary Insights and Observations

  1. The relationship between wage bills and the system of trade
    No definitive trend links the wage bill variable to the system of trade variable. This suggests that the trade procedure for goods has little direct influence on the share of public expenditure that goes to the wage bill.
  2. The relationship between the type of system of trade and the region
    Seven regions are plotted. In some regions the wage bill differs markedly by system of trade: in Sub-Saharan Africa and East Asia & Pacific, the General trade system accounts for a larger wage bill as a percentage of public expenditure than the Special trade system, while the opposite is seen for Latin America & Caribbean. South Asia and North America show no entries under the Special trade system. An interesting pattern appears for the Middle East & North Africa and Latin America & Caribbean, where the two trade systems have an almost equal influence on public expenditure.
  3. Viewed as a whole, the influence of system_of_trade on the wage bill as a percentage of public expenditure is almost equal across the two systems, but different trends appear when the data is viewed region by region.

Explanation of Results

The patterns observed in wage bills across regions and systems of trade may arise for a variety of reasons, such as:

  1. Influence of Trade Systems and Regional Policies
    The type of trade system in place, whether general or special, can be shaped by regional policies that influence public expenditure. In the Middle East & North Africa and Latin America & Caribbean regions, specific trade policies may help stabilize the impact of the chosen trade system on wage-related public spending. These regional policies can either increase or decrease the wage bill portion of public expenditure.
  2. Variations in Trade Systems and Cost Allocation
    Regions with stricter trade policies may incur higher wage costs, as their complex requirements demand more resources and regulatory oversight. In contrast, countries with simpler trade systems may face lower wage bills due to reduced vetting and resource needs.
  3. Size of the Public Sector and Economic Structure
    The impact on wage bills also varies with the size of the public sector and the economic structure of each region. For example, North America may have a relatively smaller public sector, resulting in lower wage-related public spending. In contrast, regions like Sub-Saharan Africa and East Asia & Pacific may have larger public sectors and rely more heavily on general trade systems, leading to higher wage bills.

Summary of Overall Patterns

The analysis of income structures and public sector wage distributions across multiple factors is explained using the three visualizations. Each visualization answers a specific part of the main question, helping to give a clear understanding of global income structures.

The heatmap visualization shows how average wage structures change over time across different regions, using 5-year intervals. It reveals major differences in wage levels between regions. The line plot examines how male and female wage premiums in the public sector differ across regions and change over time. It highlights several important patterns, such as stable wage premiums over time, gender differences, and fluctuations due to policy changes. The bar chart explores how different trade systems (general and special) affect wage bills as a percentage of public expenditure across regions. The analysis shows no universal trend linking trade systems to the public wage bill, which suggests that trade systems alone do not directly determine how much governments spend on wages. The regional differences point instead to the combined influence of regional trade policies and public sector structures in shaping wage bills.

Together, these visualizations provide insights into how regional, gender-based, and systemic factors affect income structures and public sector wage distributions. High-income regions consistently have higher wage levels, while emerging economies show slow improvements, and low-income regions face persistent challenges. Gender wage dynamics vary widely, reflecting cultural and economic differences across regions. Finally, the influence of trade systems on public wage expenditure is complex and varies by region, shaped by local policies and economic structures. These findings emphasize the need for focused strategies to reduce inequalities and encourage fair economic growth.

Teamwork

The data cleaning code and references were done individually by everyone and then compiled.
Vedam Akshata Jaishankar
1. Visualization 2 code
2. Visualization 2 summary
3. Rmd Compilation
Muthukrishnan Navya
1. Visualization 3 code
2. Visualization 3 summary
Jain Paridhi
1. Visualization 1 code
2. Visualization 1 summary
Mitra Reet
1. Visualization 2 code
2. Data Cleaning and Summary
3. Overall Patterns Summarized
Harihara Venkatesan Vaishnavi
1. Visualization 1 code
2. Introduction

References

  1. Data Source: Our data source is from the TidyTuesday Project, Worldwide Bureaucracy Indicators https://github.com/rfordatascience/tidytuesday/blob/master/data/2024/2024-04-30/readme.md
  2. World Development Indicators: https://datatopics.worldbank.org/world-development-indicators/
  3. World Bank. (2024). Worldwide Bureaucracy Indicators. Retrieved from the World Bank Data Catalog : https://datacatalog.worldbank.org/search/dataset/0038132/Worldwide%20Bureaucracy%20Indicators?version=3
  4. World Bank Blogs. (2024). Introducing the Worldwide Bureaucracy Indicators. : https://blogs.worldbank.org/en/developmenttalk/introducing-worldwide-bureaucracy-indicators
  5. Center for Global Development. (2024). Analyzing Public Sector Employment and Wages with the WWBI : https://www.cgdev.org/blog/three-lessons-world-banks-new-worldwide-bureaucracy-indicators-database